(57) Abstract: A method and system are described
which allow users to identify (pre-recorded) sounds
such as music, radio broadcasts, commercials, and
other audio signals in almost any environment.
The audio signal (or sound) must be a recording
represented in a database of recordings. The service
can quickly identify the signal from just a few
seconds of excerpt, while tolerating high noise
and distortion. Once the signal is identified to the
user, the user may perform transactions interactively
in real-time or offline using the identification
information.

METHOD AND SYSTEM FOR
PURCHASING PRE-RECORDED MUSIC

CROSS REFERENCE TO RELATED APPLICATIONS 
This application claims the benefit of U.S. Provisional Application No.
60/222,023, filed July 31, 2000, the disclosure of which, and subsequently filed
applications converted therefrom, is herein incorporated by reference.

BACKGROUND OF THE INVENTION 

The present invention relates generally to methods and apparatuses for
obtaining information about and/or purchasing pre-recorded music, and more
particularly to a method and system for obtaining information about and/or
purchasing pre-recorded music while listening to the music at any location.

When listening to music, people often want to identify a song currently
being played on an audio system, such as a radio, but can identify neither the title
nor the artist. The listener may simply be interested in the artist, title, lyrics, genre,
or other information about the music. The listener may also be interested in
obtaining a copy of the music, i.e., purchasing the music.

Frequently, the radio station announcer does not state the title, recording
artist or other information about the song at the moment when the listener is
interested in this information. Even if this information was announced, it may have
been announced before the song was played and at that time the listener was not
interested or was not then tuned to the station. The listener must then wait until
hearing the song again and hope that the title and artist are announced at that time.

Even when information about the song is announced and heard, there are
situations in which such information cannot easily be retained, such as when a
listener is operating in a noisy environment, such as in an automobile, or when a
listener does not have pen and paper at the ready. This situation presents problems
to radio listeners and potential music purchasers and sellers alike. Market studies
have shown that many radio listeners prefer radio stations that always announce the
name of every song and artist.

A significant number of radio listeners seeking to buy music at record stores
are often unable to remember the name of a song or the recording artist by the time
they enter a music store to purchase the music. In fact, the sheer number of music
recordings available for purchase in a music store can be so imposing that many
novice music purchasers do not venture into such stores to purchase music, despite
wishing to purchase music. Music fans would buy more music if they had
immediate information about the title of the song and artist as it is being played,
such as on the radio or other location remote from typical retailing locations.

Methods exist for automatically identifying music from a high-quality,
relatively lengthy recording. For example, companies that monitor radio broadcasts
to determine copyright and publishing royalties and to construct listings of current
best-selling or most popular recordings (i.e., "top charts") employ certain techniques
for identifying copyrighted songs from the broadcast. However, these methods
require a high quality piece or excerpt of the song (referred to as a "sample") and
are ineffective on short noisy samples. Usually, these methods require a clear signal
that is a direct, high-quality connection to the radio output before it is broadcast to
prevent noise from corrupting the sample. Consequently, these methods cannot
work in a noisy environment using short samples.

Many times unidentified music is heard when riding in a car (or at another
similarly inconvenient location). Moreover, when a listener decides he wishes to
know the identity of a particular song being played, it is usually well into the song.
Therefore, even if the listener were to begin recording the song at the moment he
decides he wishes to know the identity of the song, the sample would be relatively
short and possibly noisy depending upon the quality of the audio recording and the
recording environment. Certainly, most listeners do not carry high quality recording
equipment with them when traveling in a car.

Moreover, even if the listener knows the identity of a song, as time passes
the desire to obtain a copy of the song also passes. This is the so-called impulse
purchase phenomenon, which is well known to retailers. The impulse purchase
phenomenon is particularly strong where the listener has not heard the song before,
and thus is unfamiliar with the title and/or recording artist. Unfortunately, there is
currently no way for a music seller to take advantage of a potential impulse
purchase resulting from a listener hearing a song (for perhaps the first time) in a car
or other location that is remote from normal retail locations.

The present invention is therefore also directed to the problem of developing
a method and system for identifying music and/or enabling music retailers to
take advantage of the impulse purchase phenomenon.

SUMMARY OF THE INVENTION

The present invention solves these and other problems through a method
and system for providing a user with an ability to interactively engage with a
service to trigger one or more of a variety of experiences, transactions or events by
capturing a sample of an audio stream to which the user is exposed, and delivering
that sample to the service. Such experiences, transactions and events include
purchases by the user, delivery of information from the service to the user, the
execution of tasks and instructions by the service on the user's behalf, or other
interactions that are responsive to the user's wishes.

Thus, according to one exemplary embodiment of the present invention, a
service utilizes a system for identifying songs and music from a relatively short
sample and enabling a user to interact with the service to immediately purchase a
recording of the identified music remotely. In this embodiment, a user captures a
relatively short sample of music being played over a music playing device such as a
radio, by placing a call with a mobile telephone using a predetermined telephone
number and playing the music into the handset of the telephone. A system at the
other end of the telephone identifies the song to the user in real-time (i.e., within the
duration of the short call). Employing an interactive voice response unit ("IVR"),
the service enables the user to immediately purchase a recording of the identified
song. The present invention thus takes advantage of the impulse purchase
phenomenon by providing a mechanism that guarantees that the user is at the height
of interest in the particular music. Coupled with this mechanism is the capability of
the user to purchase the music in a very short transaction to ensure the interest does
not fade.

Other exemplary embodiments of the invention provide for pre-authorized
purchases of identified songs and the detection of unauthorized use of copyrighted
materials.



BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts an operational block diagram of an exemplary embodiment in
accordance with one aspect of the present invention.

FIG. 2 depicts an operational block diagram of details of the signal
identification block in FIG. 1 in accordance with another aspect of the present
invention.

FIG. 3 depicts an operational block diagram of details of the reporting and
transaction block in FIG. 1 in accordance with yet another aspect of the present
invention.

FIG. 4 depicts an operational block diagram of an interaction between a user
and interactive voice response unit in accordance with yet another aspect of the
present invention.



DETAILED DESCRIPTION 

At this point, it is worth noting that any reference herein to "one
embodiment" or "an embodiment" means that a particular feature, structure, or
characteristic described in connection with the embodiment is included in at least
one embodiment of the invention. The appearances of the phrase "in one
embodiment" in various places herein are not necessarily all referring to the same
embodiment.

The present invention includes a method and system for providing a user
with an ability to interactively engage with a service to trigger a variety of
experiences, transactions, or events by capturing a sample of an audio stream to
which the user is exposed and delivering that sample to the service. It is noted that
the terms "service" and "service provider" and "system" as used herein include a
service, service provider, and system, respectively, which employ the various
principles of the present invention unless otherwise indicated.

The audio stream can take any form where a message or information (i.e.,
content) is available for the user to experience, and may come from many sources,
including radio and television, pre-recorded audio, signals on internet and
computer-based systems, telephones, and even live demonstrations or performances.
Using a sampling device, such as an ordinary mobile (or cellular) phone, the user
captures a sample of the audio stream and transmits the sample to a service provider
employing the present invention.

The service provider may employ the sample by itself, may derive
information from the sample, may use data known about the user (e.g., the user's
identity and/or user profile), may prompt the user for additional information, or may
employ a combination of all such inputs, to trigger an experience, transaction or
event that is responsive to the user's needs. As described in more detail in the
various embodiments of the invention below, such experiences, transactions and
events include purchases of merchandise or services by the user, delivery of
information from the service to the user, the execution of tasks and instructions by
the service on the user's behalf, and other interactions.

An exemplary embodiment of the present invention provides a method and
system for both identifying songs and music from a sample captured and
transmitted using a mobile telephone, and providing a listener with an immediate
opportunity to purchase the identified music. As mobile telephones are becoming
more prevalent in the marketplace, listeners are likely to have access to a mobile
telephone when hearing a song they would like to identify.

When desiring to obtain the identity or other information about a song being
played, the user (i.e., listener) dials a predetermined telephone number using a
mobile telephone (or any other available telephone, including landline, cordless,
etc.), and then holds the telephone handset within audible distance of the source of
the audio signal. The telephone number may be programmed into the phone by the
user, or the number may be preprogrammed into the phone before it is sold in some
applications of the invention.

A system at the other end of the telephone automatically answers the phone,
compares the sample to those music recordings in its database and then identifies
the song to the user. Using an IVR (or a live operator and/or a combination of IVR
and live operator; all alternatives collectively referred to below as "IVR"), the user
is then provided the opportunity to immediately place an order for a recording of
the identified music, for example, an album on CD or tape that contains the selected
track.

The purchase may be physically fulfilled (i.e., forwarded to the user) in any
number of different ways, such as regular mail, express mail, overnight delivery,
courier, etc. Alternatively, digital delivery and fulfillment techniques may also be
used, including digital downloads to users' PCs, wireless internet devices, handheld
personal computers, Palm Pilots, or mobile telephones.

WO 02/27600    PCT/US01/29728

Other purchase options may also be provided to the user in accordance with
other aspects of the present invention. In addition to being able to purchase the
identified music, the user may purchase other merchandise (which could be related
to the music or not). For example, after purchasing the album containing the
identified song, the user may choose to purchase tickets to an upcoming
performance by the song's artist or send flowers to a loved one who shares that
"special song".

In another exemplary embodiment of the invention, the user can set up an
account with the service that lets the user elect in advance to purchase an
identified music selection automatically, simply by calling a
predetermined number. That is, the service treats a call to a dedicated number as an
order for the identified music. According to this embodiment, when the user calls
the predetermined number, the computer answers the telephone, identifies the user
by the phone's ID, and, after the song selection is identified, a sales order is
automatically placed. The user's account is then debited, and the selection is
forwarded to the user. The selection could even be added to a compilation being
created for the user from other similarly identified selections.
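
By way of illustration only, the pre-authorized purchase sequence described above (identify the caller by the phone's ID, place the order, debit the account) might be sketched as follows. The `Account` fields, the function name, the status strings, and the use of a spending limit and a prepaid balance are hypothetical choices made for this sketch and are not specified in this document; the recognition step is assumed to have already produced the song title and price.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Account:
    # Hypothetical account record keyed by the calling phone's ID.
    caller_id: str
    balance_cents: int
    spending_limit_cents: int
    orders: List[str] = field(default_factory=list)


def place_preauthorized_order(accounts: Dict[str, Account], caller_id: str,
                              song_title: str, price_cents: int) -> str:
    """Treat a call to the dedicated number as an order for the identified song."""
    account = accounts.get(caller_id)
    if account is None:
        return "unknown caller"  # no account is set up for this phone ID
    if price_cents > account.spending_limit_cents or price_cents > account.balance_cents:
        return "declined"  # exceeds the per-purchase limit or the remaining balance
    account.balance_cents -= price_cents  # debit the user's account
    account.orders.append(song_title)     # record the sales order for fulfillment
    return "ordered"
```

In this sketch a declined or unknown call leaves the account untouched, so the caller could be routed to the ordinary interactive flow instead.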

For example, the system could keep adding identified songs to a personal
compilation until the compilation reaches the limit of the recording media. Once
reaching the limit of the recording media, such as a compact disk, the system would
issue the media to the user. This system enables recording artists and record
companies to receive the proper royalties for their music while enabling listeners to
create individual music compilations on any desired media.
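
As a rough sketch of the compilation behavior described above, the following assumes a capacity of 74 minutes for the recording media and represents each track as a (title, duration in seconds) pair. Both the capacity figure and the function name are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Track = Tuple[str, int]  # (title, duration in seconds)

CD_CAPACITY_SECONDS = 74 * 60  # assumed media limit, for illustration only


def add_track(pending: List[Track], track: Track,
              capacity: int = CD_CAPACITY_SECONDS
              ) -> Tuple[List[Track], Optional[List[Track]]]:
    """Add a track to the pending compilation.

    Returns (new_pending, issued), where issued is the completed compilation
    to send to the user once the next track would no longer fit, else None.
    """
    used = sum(seconds for _, seconds in pending)
    if pending and used + track[1] > capacity:
        # The media limit is reached: issue the compilation and start a new one.
        return [track], pending
    return pending + [track], None
```

For example, with 40-minute and 30-minute tracks already pending, adding a 10-minute track would issue the first two tracks as one compilation and start a new compilation with the third.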

In some embodiments of the invention, the service relies upon a profile for
the user that is created in advance of the call (but may be updated during a call as
described below). A "user profile" may typically include general information about
the user such as name, address, preferred method of payment (i.e., credit card
pre-authorization), and set dollar limits on purchases. In addition, service-specific
information regarding the user may also be included in the profile, such as
demographic and user-identified preference information, to facilitate the service
tailoring the transaction to fit a particular user. Moreover, with automatic telephone
number identification, i.e., "caller ID", profiles can be built without prior user
registration.
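
A minimal sketch of such a profile record is given below, under the assumption that profiles are kept in a simple in-memory mapping keyed by caller ID. The field names and the `get_or_create_profile` helper are hypothetical; the document does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserProfile:
    # General information; all of it is optional so that a profile can be
    # started from caller ID alone, before any registration takes place.
    caller_id: str
    name: Optional[str] = None
    address: Optional[str] = None
    payment_method: Optional[str] = None        # e.g. a pre-authorized credit card
    purchase_limit_cents: Optional[int] = None  # set dollar limit on purchases
    preferences: List[str] = field(default_factory=list)  # user-identified preferences
    registered: bool = False


def get_or_create_profile(profiles: Dict[str, UserProfile],
                          caller_id: str) -> UserProfile:
    """Return the profile for this caller ID, creating an empty one on first contact."""
    if caller_id not in profiles:
        profiles[caller_id] = UserProfile(caller_id=caller_id)
    return profiles[caller_id]
```

Because later calls from the same number return the same record, information gathered during one call (a stated preference, for instance) is available on the next.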

Age, education, residence, gender, occupation, and personal interests, likes
and dislikes, among other criteria, may be employed to most effectively match
transaction offers to users' interests and purchase habits. For example, one
particular customer of the service may have a user profile that indicates that the user
is a member of a demographic group that is music-savvy and aware of music trends.
After offering to sell a recording of the song selected by the user, the service could
offer to sell a recording by an artist that is "moving up the charts" in popularity.
Thus, by employing a user profile in some applications of the invention, a higher
transaction closing rate may be realized as offers are more accurately targeted to
users who may be predisposed to view the offer favorably. In one embodiment of
this aspect of the invention, the system could play a sample of the other recording
over the telephone as part of the transaction, thereby ensuring that the user was
interested in this recording.

It is noted that user profiles are often dynamic and that the present invention
includes an ability to update and change a user profile in response to changes in
usage patterns (as described below), market and industry trends, user input, and
other factors.

A feature of the invention related to the user profile is usage tracking. By
tracking the frequency and time of access, the type of music sought to be identified,
and purchase history, for example, of a user, the service can gain additional insight
into factors which may influence a user. Patterns of usage may be derived which
may allow predictive modeling to be utilized, for example, to enhance and refine
service offerings. The system of the present invention can thus differentiate
between repeat users (who heavily access the service) and casual users (who
occasionally or infrequently use the service) and adjust the options, offers, and
interactive scripting (as described below) so that interest and enthusiasm are
maintained among the service's customers. It is contemplated that the user profile
and usage tracking/pattern features described here may be used in other
embodiments and applications of the invention as well.
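
One simple way to distinguish repeat users from casual users, offered only as a sketch, is to count accesses within a recent window. The 30-day window, the threshold of five accesses, and the function name are arbitrary assumptions; the document does not state how the distinction is drawn, and a real service might use predictive modeling instead.

```python
from datetime import datetime, timedelta
from typing import List


def classify_user(access_times: List[datetime], now: datetime,
                  window_days: int = 30, repeat_threshold: int = 5) -> str:
    """Label a user "repeat" or "casual" from the timestamps of past accesses."""
    window = timedelta(days=window_days)
    # Count only accesses that fall inside the recent window (and not in the future).
    recent = [t for t in access_times if timedelta(0) <= now - t <= window]
    return "repeat" if len(recent) >= repeat_threshold else "casual"
```

The resulting label could then select which options, offers, or script variant the service presents on the next call.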

It is understood that there may be some instances
where recognition of an unidentified sound or song is not possible. In some cases
this may be due to very short or noisy samples that challenge even the very capable
technology platform of the present invention, or more likely, there is not an entry in
the service's database that matches the sample even under the best conditions.

While mass storage of a vast library of recorded music and sounds is
technically feasible, clearly it is not economical for a service provider to store data
corresponding to every song or sound that exists in the world. Moreover, as barriers
to publishing come down (through increased use of such distribution channels as
the internet), it is expected that the rate at which new music is introduced to the
public will continue to increase. As a result, it is expected that the entries to the
service's database will always lag the newest (or most obscure) releases.

The present invention contemplates that such unidentifiable samples may be
advantageously utilized. Users are afforded the opportunity to "stump" the service,
acquaintances, or other users. As evidenced by the popularity of radio programming
where the disk jockey plays a small snippet of a song and then rewards the listener
who correctly identifies the "mystery" song, many users of the present service are
expected to embrace this enjoyable aspect of the invention. Accordingly, in some
applications of the invention users may store, retrieve, and forward the captured
samples (for example, to other users). A service provider can thus implement
contests, games, and promotions using the present invention to implement a modern
day "Name that Tune." Such activities may be expected to heighten interest in the
service as a whole.

In another aspect of the invention, in addition to having the option of
purchasing the identified song, the user may initiate other types of interactive
experiences, events, or transactions with the system during a call. Such interactive
experiences, events, and transactions may include purchases of merchandise or
services, the receipt of information, the storage and manipulation of archived data,
and an ability to command the system to perform desired tasks or instructions.
Thus, this aspect of the invention provides the user with a common and consistent
interface with the service to perform multiple interactive transactions.

For example, through the interactive voice response unit, the user may select
from a number of options. An interactive voice response unit advantageously
provides accurate and fast interactions between the user and the service and
minimizes the amount of intervention by service personnel. Thus, by playing a
programmed audio script to the user, and receiving DTMF (dual tone
multi-frequency or "touch") tones or voice commands from the user, the interactive voice
response unit interacts with the user by providing the option to hear a clip of the
identified recorded song or hear clips from other songs (e.g., tracks on the same
album or from the same artist as the identified song, or music that is determined by
the service to be of interest to the caller by a usage pattern or user profile).
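
The menu interaction described above can be pictured as a mapping from DTMF digits to actions. The sketch below is illustrative only: the digit assignments, action names, and fallback behavior are hypothetical and are not taken from this document.

```python
# Hypothetical DTMF menu: which key triggers which option is an assumption.
MENU = {
    "1": "play_clip_of_identified_song",
    "2": "play_clips_from_same_album_or_artist",
    "3": "play_clips_suggested_by_user_profile",
    "4": "purchase_identified_song",
    "0": "connect_live_operator",
}


def handle_dtmf(digit: str) -> str:
    """Return the action for a pressed key, or replay the menu for an unknown key."""
    return MENU.get(digit, "repeat_menu")


def menu_script() -> str:
    """Build the text of the audio script that announces the available options."""
    return " ".join(
        f"Press {digit} to {action.replace('_', ' ')}."
        for digit, action in MENU.items()
    )
```

Keeping the menu as data rather than branching logic would let a service vary the script per user, for instance based on a usage pattern or user profile.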

The user may be given the option of obtaining more information about the
song, artist, or album. Examples of such information include song genre (e.g., folk,
rock, R&B, country, rap, etc.), song lyrics, trivia, liner notes, production notes,
instrumentation, musician identification, song position in music ranking services,
such as Billboard®, and calendar of live appearances by the artist. Service-specific
information may also be furnished, such as the tracks that the service identifies with the
greatest frequency to the service subscribers on a given day. For some users, a
desire to gain such information may be equal to or stronger than the desire to know a
song's title and artist.

Data archiving and manipulation may be implemented as another interactive
user option. This aspect of the invention provides for the user to save the
identification and/or other information on a local storage medium for future access.
And, as noted above, users may wish to store the captured samples themselves. A
typical voice mailbox system may be utilized to implement such functionality, or
alternatively, a web-based interface may be established with user-allocated disk
space on a local server. Such access may be gained using the same telephone
interface used with song identification, or an alternative platform (i.e., one that is
accessed by dialing a different number).

Alternatively, saved data may be accessed and/or manipulated using such
interfaces as a web browser with internet access. This data archiving and
manipulation feature of the invention may be used to implement service features
that would, for example, allow users to store information on identified tracks and
play them back on demand, or allow others to access the stored tracks.

Additionally, users may create their own music compilations and share them
with others, using, for example, MP3 storage and access protocols. Other
information input from a user may also be stored, for example, to annotate a track
or compilation with the user's thoughts and comments. Captured samples may be
stored and manipulated as well to facilitate contests and games (as described above)
or to implement other functions that may require accessing the "raw" sample.

Another aspect of the user and service interaction provided by the present
invention includes an ability of the service to implement tasks and instructions
directed by the user. One notable example of this feature includes the connection of
a live operator (i.e., customer service personnel) to the user upon command. This
feature additionally allows a user to interact with other users, non-users, or
potential users of the service, or a user may interact with other systems.

For example, the user may instruct the service to send a recommendation for
the identified song to a friend via email or SMS or a clip of the song to the friend's
mobile telephone or service-sponsored storage site (e.g., voice mailbox or allocated
disk space). The user could also instruct the service to ship a recording by a
featured artist, or other merchandise (both music-related and non-music-related
merchandise), as a gift. Games, contests, and promotional events involving
interactive user participation may also be readily implemented using this inventive
feature.

In addition, the service, acting as a portal or gateway or referring agent, may
also provide the user with access to other systems, information, and services hosted
or provided by third parties.

Such interactive experiences may include the delivery of advertising or
promotional messages to the user from the system. Such advertising may be general
or targeted to a user based on a user profile. A user may also be directed (for
example, via call forwarding, or via web link) to fulfillment partners of the service
provider to either fulfill a purchase request or user-driven instruction in instances
where such partners are used. Alternatively, direct links to fulfillment partners,
advertisers, and other systems may be provided as an option to a user in themselves.

In addition to the interaction that is implemented between the user and
system during a call, the user may also be directed to other sources of information
that are external to the system, for example, internet websites. Such websites may
host content that is specifically related to the identified song, other music, or other
topics of more general interest. The website may be configured to provide similar or
additional interactive experiences to the user.

In another exemplary embodiment of the invention, a user experiencing a
multimedia presentation, including an audio track, may capture and transmit a
sample of content in the show to the service to indicate an interest in a product or
service that may be featured.

For example, a user may capture an audio sample of a commercial
advertisement for a particular product or service from a broadcast TV presentation.
Upon receipt and identification of the product or service from the captured sample,
the service may provide a purchase option to the user, furnish more detailed product
or service information, or provide other optional interactions as described
above.

Fulfillment of the desired transaction to the user can be accomplished by the
service directly, or through fulfillment partners as appropriate. This embodiment
may be particularly beneficial when the advertiser does not include ordering or
retail location information in the content of the advertisement. For example, a local
retailer may not be known to the advertiser or it may be impractical for the
advertiser to identify such retailers during the course of the advertisement (as in
cases where the advertisement is distributed over a broad geographic region). Or,
the advertiser may not be equipped with all the necessary facilities (such as an IVR) to
receive direct orders from consumers.

Thus, this embodiment of the invention provides a benefit to both user and
advertiser alike. For the user, the service implements a desired transaction that
would be more difficult and less convenient if the user employed conventional
means to achieve the same end. And, by locating users who are ready to purchase
and identifying them to advertisers, the service can increase product deliveries by
acting as a referral agent (by establishing a path of communication between the
user/consumer and advertiser), or by acting as a "virtual" IVR for the advertiser
(where the service provides the customer service interface between the
user/consumer and the advertiser, and the advertiser acts as a de facto fulfillment
partner of the service).

The principles of this embodiment may also be readily applied to paid
television programming (i.e., "infomercials") and home shopping broadcasts. In the
case of paid television programming, product information segments are typically
interspersed with segments that provide ordering information (during such
segments, a viewer is generally provided product pricing information and directed
to a toll-free telephone number to place an order with a live customer service
representative). The content recognition technology employed by the present
invention allows for faster, more accurate purchases on impulse (even at times
during the presentation when the toll-free number is not displayed) and provides an
alternative ordering path to the 800 IVR.

In addition, in cases where higher valued products are featured that normally
require more deliberation before purchase decisions are made, users may desire
more information to assist in their decision making but are often reluctant to interact
with a live representative to obtain it (i.e., the user is not ready to buy and wishes to
avoid being pressured into buying). By contrast, user/consumers employing the
present common user interface feature would be comfortable in knowing that such
additional information could be readily obtained without human contact as the
interface is seamless and consistent across all experiences, events and transactions
in accordance with the invention.

In the case of home shopping broadcasts, ordering information is generally
displayed at all times during the broadcast. Consumers typically place orders via
toll-free IVRs by identifying item numbers, color, size and other identification
information for the products desired for purchase to the home shopping service.
While such methods are satisfactory, use of the present invention can enhance the
home shopping experience for many users through its interactive interface that
facilitates the exchange of ordering information using the common user interface
feature described above.

In another exemplary embodiment of the present invention, the unauthorized
use of copyrighted material may be detected. Recognition of copyrighted material
without embedded watermarks is also provided. This aspect of the present invention
enables an operator of a server to sample music recordings being transmitted over
its server to ascertain whether the users are violating copyright laws. By identifying
copyright violations at the server level, a web site operator can potentially avoid
being accused of facilitating copyright infringement.

Other techniques for obtaining the music sample may be employed due to
the ability of the algorithm utilized in the present invention to process a relatively
short sample and generate a positive identification in the presence of relatively large
amounts of noise (both background noise in the user's environment and noise
resulting from signal compression and/or impairments along the signal transmission
path).

For example, the music sample can be captured and sent to the system using 
many real-time sampling techniques. Thus, in various embodiments of the 
invention, a user may capture and transmit a sample using, for example, a standard 
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(i.e., wired) telephone, internet-streaming socket, or voice-over-IP transceiver. In 
other embodiments, a sample may be first recorded and then later transmitted to the 
system for identification, using, for example, tape recording, CD writing, answering
machine, digital voice recorder, uploaded WAV file, and the like.

Thus, users of the present inventive system can employ telephony or data
communications devices including ordinary or mobile telephones, PCs, internet
access devices, wireless internet devices, personal digital assistants ("PDAs"),
wireless information devices, two-way pagers, and other devices that are equipped 
with an ability to capture a media sample in accordance with the principles of the
present invention.

The present invention provides tremendous advantages to users to facilitate 
the identification of songs anywhere and anytime. The identification occurs on a 
real-time basis, that is, the user can expect the information to be provided within the 
duration of a short transaction with the service (such as a one-minute telephone
call). The user does not need to wait to receive the desired information at a later
time. The user simply and easily accesses an interactive voice response unit, dial-up
data interface or website, allows the microphone on the phone or PC to capture the 
music being played or performed anywhere for a few seconds, and, within seconds, 
learns the name of the song and the artist, as well as a range of additional
information upon request. Users can then instantly purchase the music (in multiple
formats), or save the information and access it later on a personalized web site 
(which may be configured, for example, to provide "personalized radio," i.e., 
content aggregation according to the user's taste, using streaming audio), or
perform several other actions from their telephone, internet access device, personal
digital assistant (PDA), or PC. 

By providing for real-time identification of songs being played or performed 
anywhere, the present invention solves the problems identified above and provides 
significant advantages for both buyers and sellers of music. First, music buyers
employing the present invention are provided with the song and artist information 
that they wish to know, as well as the ability to purchase a song or an entire album 
as soon as they hear it, i.e., at the peak of their interest in the song. Second, sellers 
of music, including record companies and retailers, will benefit from heightened 
impulse purchases, as customers are able to recognize and remember songs they
hear. 

Furthermore, the present invention facilitates the creation of a rich customer 
information database that may be used to fine-tune marketing campaigns by radio
stations, record companies, and retailers, such as promotions and targeted
marketing. Such data may be useful for the music industry to learn valuable new
insights on customers' music preferences. The reactions of the users of this service 
to new music will provide valuable predictive information to recording labels and 
musicians attempting to market their recordings. Thus, the present invention 
provides for the first time a direct connection to the heartbeat of the music
consumers for music retailers, record companies and radio stations.

The present invention provides a service operator with a multitude of 
revenue opportunities. The immediate purchase opportunity for the user afforded by 
the invention could allow music sellers to provide monetary and other incentives to 
the service provider so that the identification service is offered free of charge or at a
subsidized rate (i.e., a rate that may be less than the cost of providing the service).
Alternatively, the service could obtain revenue from the price of the telephone call, 
e.g., similar to 900 toll call telephone service, in which the user pays a toll set by 
the 900 number system operator above and beyond the normal toll fee for placing 
the call.

The service may also share in the increased revenue realized by telephone 
companies and internet service providers ("ISPs") as users increase their usage of 
access facilities to use the service. In addition, subscription arrangements with users 
may be established and revenue received for the service's delivery of advertising
and promotional materials to the user. Revenue may also be realized through
merchandise retailing and fees collected from fulfillment partners. Revenue may
also be realized by selling information or advertising to companies by leveraging 
the user-specific knowledge acquired through use of the service. 

The present invention employs an audio pattern-recognition technology that
can be used to recognize a sound signal within a short, noisy, and distorted audio
sample, which technology includes a robust recognition engine that is context free, 
convenient, fast, and noise immune. The audio pattern recognition technology is 
robust in that it works with highly distorted and noisy signals. In contrast, current 
music recognition engines require high-quality signals. The audio pattern
technology is context free in that it does not rely on the knowledge of the source of
the broadcasting. Prerecorded sounds can be recognized in circumstances where the 
source is unknown (for example, in bars and restaurants, on the street, in cars,
airplanes, public transportation, etc.). The audio pattern technology is convenient in 
that excerpts of only a few seconds from any unique portion of the entire song
rendition are sufficient to enable recognition. Other techniques require
comparatively long samples of the music to positively identify it. The audio pattern 
technology of the present invention is fast because it uses an algorithm that can 
search a very large database in a very short period of time.

The present invention overcomes the shortcomings suffered by various other
known arrangements that attempt to solve the song identification problems
mentioned above. One such arrangement identifies information (i.e., songs and 
commercials) broadcast by certain major radio systems by relying on a third-party 
identification service and the manual input by the user of data specifying the source
of the broadcast information.

Another known arrangement requires users to carry a keychain, which can 
timestamp the moment a song is played on the radio, and later enable purchases 
when the device is synchronized with an internet-connected computer. Users of 
such arrangements suffer an undesirable delay before learning the identity of the
song, thereby diminishing the power of the impulse purchase phenomenon. In
addition to the delay limitations, both above-described arrangements can only 
identify songs aired on radio stations. 

Other music recognition arrangements have avoided attempting to perform 
music identification directly. Instead, they rely on contextual information, such as a
timestamp plus radio station in order to look up a playlist provided by a third-party
source. Such third-party information must be gathered using playlists submitted by
radio stations, causing dependencies on business relationships, reduced radio
station participation, or problems when the disc jockey changes the actual song
played at the last minute. Users of certain such arrangements are limited to only the
radio stations with which they have registered. Alternatively, third-party playlist
information can also be provided by music tracking services, but such services
usually only track radio broadcasts for the largest radio stations in major 
metropolitan areas. 

Another limitation with these known music recognition arrangements is that
they cannot be used in the conditions under which the present invention operates with
the same effectiveness. Current recognition technologies do not lend themselves to 
robust recognition as they fail under less than ideal conditions such as those 
encountered due to noise, dropouts, interference, bandlimiting, and voice-quality
digital compression. Signals that are transmitted through additional mediums that
are subject to linear and nonlinear distortion cause search methods that rely on 
cross-correlation or statistical moments to fail. Additionally, the arrangement of the 
present invention can recognize the music with as little as 5 to 15 seconds of 
sampling time, depending on the signal quality and size of the database, potentially
allowing recognition and purchasing transactions to be carried out within a
one-minute telephone call.

Other known music recognition arrangements embed a perceptually 
inaudible watermark or other side-channel information to identify the music. To 
take advantage of such embedded information the user must have a special
decoding device, perhaps built into the receiver, to obtain the information. Using a
telephone, or a computer with a web browser, it is possible to obtain play 
information from the radio station. But this requires the user to know which station 
he or she is listening to, as well as how to contact the station. By comparison, the 
method and system of the present invention is context-free, requiring only that the
user submit the sampled music information directly for identification. No
watermarks, side information, special devices, or knowledge of radio station contact 
information are necessary. 

Referring now to FIG 1, there is shown an operational block diagram of an 
exemplary embodiment 100 according to one aspect of the present invention. A
signal source 101 in the environment generates a signal 103 that is captured by
signal capture block 105 and transmitted to signal identification block 110 on line
107. Signal identification block 110 is described in greater detail below. Signal 
identification block 110 identifies the signal 103 from signal source 101 and passes
the identification information (i.e., song ID) to reporting and transaction block 115
via line 113. User identification block 120 additionally provides the identity of the
user (not shown) to the reporting and transaction block 115 on line 118, whereby 
the user may be informed as to the identity of the signal as described more fully in 
the text accompanying FIG 3 below. 

In some applications of the invention, as discussed further below, it may be
preferable to have user identification block 120 operationally coupled to, or
alternatively incorporate, a user database where user data (and data related to other 
user options), among other features, may be read from and written to (not shown in 
FIG 1). The user may optionally carry out transactions regarding this signal identity
as described in greater detail below and in Appendix 1. Billing data from user
billing block 125 on line 127 may be used to generate a bill to the user 
commensurate with the extent of the usage of the signal identification service aspect 
of the invention as described herein. Additionally, statistics regarding the signal 
identification, signal identified, and/or the user's usage of the service may be
logged by statistical data logger 130. Such data is of interest in market research, and
has economic value. In such embodiments of the invention, user identification 
information is passed from user identification block 120 to statistical data logger 
130 on line 124 as shown in FIG 1 and reporting and transaction data from 
reporting and transaction block 115 is passed via line 129. In some applications of
the invention it may be desirable for statistical data logged by logger 130 to be fed 
to user identification block 120, via line 124, for example to facilitate the update of 
user identification information. 

The signal 103 generated by signal source 101 in FIG 1 may be any kind of
signal in the environment that is a rendition of a signal indexed in a database within
the signal identification block 110. Examples of signal 103 include recorded music, 
radio broadcast programs, advertisements, and other such signals of interest. 
Accordingly, the signal may be in the form of acoustical waves, radio waves, digital 
audio PCM stream, compressed digital audio stream (such as Dolby Digital or
MP3), internet streaming broadcast, or any other such manner of transmitting such
pre-recorded material. Furthermore, signal 103 may be degraded as a result of
background noise (such as that encountered while in a moving car), talking voices, 
transmission errors and impairments, interference, time warping, compression, 
quantization, filtering, or other such distortions of the original. 

Signal capture block 105 captures a sample of signal 103 and provides it in a
format suitable for processing by the signal identification block 110. Illustrative
embodiments of signal capture 105 devices include, but are not limited to, 
microphone, telephone, mobile telephone, tape recorder, digital voice recorder, 
answering machine, radio receiver, walkie-talkie, internet streaming socket,
voice-over-IP transceiver, or other such signal capturing device. Typically then, the signal
capture device is incorporated into a device that the user employs in a location 
remote from the service. Conventional devices like mobile and regular telephones, 
PCs, radios, and other recording and communication devices that users already own 
or use every day for other purposes may conveniently be used, without modification,
to practice the present invention. Upon capture of a sample of signal 103, signal 
capture block 105 delivers the sample to the system via line 107, as indicated. 

User identification block 120 identifies the user to the song recognition 
arrangement of the present invention, and may optionally be operationally coupled
to the signal identification block 110 via line 122. Examples of devices which
generate the appropriate identification for use with user identification block 120 
may include caller ID on a POTS (Plain Old Telephone Service) line or a mobile 
telephone, internet IP address of a terminal sending in the captured signal, or a 
cookie file stored on an internet browser on the user's terminal. In such
implementations, the user's signal capture block 105 stores a user ID and identifies
the user to the arrangement 100. In another illustrative example of user 
identification block 120, the user may be required to enter an account code, for 
example by keying it in on a touchtone pad on a telephone or saying a pass phrase 
while signing on to a service incorporating the principles of the present invention if
dialing in. Alternatively, the user may be identified by inserting an object carrying
identification codes into a terminal. Examples of this include a credit card, ATM 
card, or Dallas Semiconductor Java Ring. The user may also be identified by a 
biometric device to scan fingerprints, retinae, palm print, or other such physical 
characteristics of the user. A speaker identification system to identify the user by
vocal characteristics is another alternative method. User identification block 120 is
an optional component of the present arrangement, which is employed if billing and 
tracking of user activity is desired. 

As shown in FIG 4, the elements shown and described in FIG 1 are typically 
associated with entities that are independent of one another. Signal source 101 is
typically associated with a media operator or content provider such as radio or 
television broadcasters, CATV provider, internet service providers, private network 
or LAN operators, and the like. It is important to emphasize that the present 
invention contemplates that the signal source may comprise a live demonstration or
performance, taking place, for example, at a nightclub, bar, or discotheque. Signal
capture block 105 is associated with users; however, such association may be
merely temporary, as public access devices (e.g., public telephones and internet 
access facilities) may be readily used, without modification, in order to realize the 
benefits provided by the present invention. Signal capture block 105 represents
features and functionalities that, for example, are implemented by the microphone
and associated transceiver circuits in a user's mobile telephone. 

The remaining elements of FIG 1 are collectively associated, as indicated in
FIG 4, with a service provider. Signal identification block 110, user identification
block 120, reporting and transaction block 115, user billing block 125 and statistical
data logger 130 represent features and functionalities of an integrated system that
form key elements of an IVR arrangement that may be particularly useful in some
applications of the invention. In such IVR arrangements, these collected elements
are typically implemented in a system formed by one or more CPUs. The IVR is
identified by reference numeral 450 in FIG 4.

As depicted in FIG 4, a media provider 410 utilizes signal source 101 which
transmits signal 103 via media distribution network 420 which may be arranged 
from network and transmission elements or other channelized distribution 
arrangements (as for example, with copper or fiber networks for data or telephony 
services) or a free space/transmitter infrastructure array (as for example with radio
and television broadcasts, satellite systems, and cellular/PCS/GSM wireless 
telephone services or networks which operate according to short-range wireless 
protocols such as the Bluetooth™ wireless standard). Receiver/monitor 440 is 
employed by user 430 to receive signal 103 and transform the signal into a format
that allows signal 103 to be monitored by the user. Receiver/monitor 440 may be
a radio, television, PC, Hi-fi (i.e., stereo) with speakers or any other device that may 
be used to create a media experience (including audio and video) that may be 
monitored by the user. User 430 employing the functionalities of signal capture 
block 105, for example using a mobile telephone, obtains a sample of signal 103
played on receiver/monitor 440, where the sample includes media content of
interest selected by the user, such as a portion of an unidentified song. Thus, as 
shown in FIG 4, receiver/monitor 440 both outputs a rendition of signal 103 to 
signal capture block 105 and allows the user 430 to monitor signal 103. However, it 
is noted that signal capture block 105 may capture a sample of signal 103 via a
direct connection to media distribution network 420 (i.e., not relying on
receiver/monitor 440 or similar device for signal input). In such instances, the user
monitoring of signal 103 is accomplished through other means or user monitoring is 
not performed.

The captured sample of signal 103 is relayed by the user 430 to the IVR
450 via communication network 460 on line 107, as shown.
Communication network 460 may have a similar arrangement as distribution 
network 420, or may be unitary with distribution network 420. It should be noted, 
however, that in certain applications of the invention distribution network 420 may
typically be characterized by unidirectional signal propagation (as in the case with 
broadcast radio and television or typical CATV headend systems) while 
communication network 460 may typically be characterized by bidirectional signal 
propagation (as in the case with the public switched telephone network and wireless
or wired voice, data, and internet systems). Such bidirectional nature of
communication network 460 is indicated by signal flow lines 107 and 108 as
depicted in FIG 4. 

In accordance with the invention, the IVR 450 derives information or 
characteristics of the sample of signal 103, including the identification of content
contained therein (for example, the song ID). Such derived information may be
returned to the user 430 from the IVR 450 using the same communication network 
460 or other networks. The signal return path is indicated with lines 108 in FIG 4. 
And, as described above, the IVR may interact with the user and other entities. For 
illustrative purposes, such interaction pathways are depicted in FIG 4 as lines 489
and 482, input and output, respectively, via alternate network 480. Alternate
network 480 may be a network of any type; however, in some applications of the
invention it may be advantageous to employ private networks, dedicated lines, or 
other high-capacity transmission methods should high-bandwidth interactions be 
desired. Such bandwidth intensive interactions could occur, for example, between
the service IVR and fulfillment partners such as record distributors. This
communication path is shown in FIG 4 where fulfillment center 486 interacts with 
IVR 450 using alternate network 480 and fulfills user purchase orders as indicated 
by line 488. Fulfillment center 486 may also interact with IVR 450 using 
communication network 460 over interaction pathways 492 and 494 as indicated.

Thus, FIG 4 may serve to additionally highlight the principles applied in the 
exemplary embodiment of the invention described in the Summary. A radio station 
(media provider 410) employing signal source 101 broadcasts a song (signal 103) 
over the air (media distribution network 420), which is received on a radio
(receiver/monitor 440) of user 430. The song, which is unknown to user 430, is of
interest. User 430 places a mobile telephone call over a wireless network 
(communication network 460) to the IVR 450. The user 430 positions the 
microphone of his wireless telephone (signal capture device 105) to capture a 
sample of the music being played on the radio. The IVR 450 receives the sample
over the wireless network and derives the identity of the song. Optionally, the
identity of the user may be derived from the user's mobile telephone number that is
sent from the communication network 460 to the IVR 450 (and user identification 
block 120, in particular) typically during call set up. The song identification is
returned to the user's mobile telephone over the same network. Further
interactions between the user 430 and the IVR 450 may occur, and if the user
chooses to purchase a recording of the now-identified song, the IVR can send the
purchase information to the service's distribution facility (fulfillment center 486) 
via data or voice communication using its local area PC network (alternate network 
480) or via data or voice communication over communication network 460 as
discussed above. The distribution center sends the selection to the user via
overnight courier service (line 488). Similarly, alternate network 480 may be
utilized to send a confirming email of the song identification and purchase, if
applicable, to the user's email account as described in greater detail above.

Signal identification block 110 (FIG 1) is described in detail in Appendix 1.
In summary, and referring to FIG 2, the main modules in signal identification block
110 are the landmarking block 210, fingerprinting block 215, song database block
220, and alignment scan block 225. Landmarking block 210 receives the incoming 
captured sample signal via line 107, as described above, and computes sample
landmark time points. This information is passed, as indicated in FIG 2, to
fingerprinting block 215 via line 212, as well as the alignment scan block 225 on
line 214. Fingerprinting block 215 computes fingerprints from the captured sample 
signal at the landmarked time points, generating sample fingerprints. The sample 
fingerprints are output to song database 220 on line 218 and are used to retrieve sets
of matching song fingerprints stored in song database 220, these matching song
fingerprints being associated with song landmark and song ID values. The set of
retrieved song ID and song landmark values are relayed to the alignment scan block 
225, on lines 226 and 227, respectively, along with the associated sample landmark 
values on line 214 as indicated. Alignment scan block 225 sorts out sets of sample
landmark and song landmark pairs, grouped by common song ID. Each set is
scanned for linear correspondences in the pairs of landmarks and scored according
to best fit. The song ID of the set with the highest score is the winning song ID
which is output from alignment scan 225 on line 113. In an illustrative embodiment 
of the invention, the components forming signal identification block 110 are
clustered together in a single computer system, such as an Intel-based PC or other 
workstation. In another illustrative embodiment of the invention, a networked 
cluster of CPUs may be used, with different software modules distributed among 
different processors in order to distribute the computational load. It may be 
preferable, in some applications of the invention, to use a cluster of Linux-based
processors connected by a multi-processing bus architecture or a networking 
protocol such as the Beowulf cluster computing protocol, or a mixture of the two. In 
such an arrangement, song database block 220 is stored in RAM on at least one 
node in the cluster, ensuring that fingerprint searching occurs very rapidly. It is
noted that some computational nodes corresponding to the other functional blocks
of signal identification block 110, such as landmarking 210, fingerprinting 215, and 
alignment scan 225 may not require as much bulk RAM as the nodes supporting 
song database 220. The number of computational nodes assigned to each module 
may thus be scaled according to need so that no single module becomes a
bottleneck. The computational network is thus, advantageously, highly
parallelizable and can additionally process multiple simultaneous signal 
identification queries, where such queries are distributed among available 
computational resources. 
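
The data flow through song database block 220 and alignment scan block 225 may be illustrated with the following minimal sketch. It is not the implementation of Appendix 1. It assumes, purely for illustration, that fingerprints are integers, that landmarks are integer time indices, and that a "linear correspondence" is a constant time offset between song landmark and sample landmark (i.e., no time stretching). The names build_index and alignment_scan and the dictionary-based index are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_index(songs: dict) -> dict:
    """Index each song's (landmark, fingerprint) pairs by fingerprint value.

    `songs` maps a song ID to a list of (landmark, fingerprint) pairs.
    The result maps fingerprint -> list of (song_id, song_landmark).
    This in-memory dict is an assumed stand-in for song database 220.
    """
    index = defaultdict(list)
    for song_id, pairs in songs.items():
        for landmark, fingerprint in pairs:
            index[fingerprint].append((song_id, landmark))
    return index


def alignment_scan(sample: list, index: dict) -> tuple:
    """Return (winning song ID, score) for sample (landmark, fingerprint) pairs.

    Matching landmark pairs are grouped by song ID. Within each group the
    time offsets (song landmark minus sample landmark) are histogrammed, and
    the group's score is the count in its most populated offset bin
    (assumption: a true match shows up as many pairs sharing one offset).
    """
    offsets = defaultdict(Counter)
    for sample_landmark, fingerprint in sample:
        for song_id, song_landmark in index.get(fingerprint, []):
            offsets[song_id][song_landmark - sample_landmark] += 1

    best_id: Optional[str] = None
    best_score = 0
    for song_id, histogram in offsets.items():
        score = max(histogram.values())
        if score > best_score:
            best_id, best_score = song_id, score
    return best_id, best_score
```

In this sketch a sample taken from the middle of a song yields several landmark pairs sharing a single offset, whereas chance fingerprint collisions with other songs scatter across unrelated offsets and therefore score low. A practical system would presumably also require the winning score to exceed a threshold before reporting an identification.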

In an illustrative embodiment of the invention, some of the functional
modules may be less tightly coupled together than the other modules. For
example, the landmarking and fingerprinting functions of blocks 210 and 215 of
FIG 2 may reside in a physically separate location from the rest of the 
computational devices shown. One such example of physical separation would be 
realized if the landmarking and fingerprinting functional blocks 210 and 215,
respectively, are more tightly associated with the signal capture block 105. In such
arrangement, the landmarking and fingerprinting functions described above are 
realized and incorporated as additional hardware or software embedded in a mobile 
telephone, WAP browser, or other remote terminal, such as the client-end of an 
audio search engine. In an internet-based audio search service, such as a content
identification service, the landmarking and fingerprinting functions may be 
incorporated into the browser application as a linked set of software instructions, or 
as a software plug-in module, such as a Microsoft DLL. In such embodiments 
employing loosely coupled functional modules, the combined signal
capturing/landmarking/fingerprinting functional blocks shown in FIG 2 form the
client end of the service and send a feature-extracted summary of the captured 
signal, comprised of sample landmark/fingerprint pairs, to the server end (e.g., song 
database 220 in FIG 2). Sending this feature-extracted summary to the server 
instead of the raw captured audio is advantageous since the amount of data is
greatly reduced, often by a factor of 100 or more. Such feature-extracted summary
could be sent in real-time over a low-bandwidth side channel along with, or instead 
of, an audio stream transmitted to the server. 
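
The data reduction described above may be illustrated by the following sketch, which serializes landmark/fingerprint pairs into a compact payload and compares its size with that of the raw audio. All of the specifics are assumptions made for illustration and are not taken from this description: a wire format of two unsigned 32-bit integers per pair, telephone-grade mono PCM at 8000 samples per second and 2 bytes per sample, and the function names themselves.

```python
import struct

# Assumed wire format (hypothetical): a 32-bit landmark time index followed
# by a 32-bit fingerprint, both unsigned, big-endian, 8 bytes per pair.
PAIR_FORMAT = ">II"


def pack_summary(pairs):
    """Serialize sample (landmark, fingerprint) pairs into a byte string."""
    return b"".join(struct.pack(PAIR_FORMAT, lm, fp) for lm, fp in pairs)


def unpack_summary(payload):
    """Recover the list of (landmark, fingerprint) pairs at the server end."""
    return list(struct.iter_unpack(PAIR_FORMAT, payload))


def raw_pcm_bytes(seconds, sample_rate=8000, bytes_per_sample=2):
    """Size of uncompressed mono PCM audio in the assumed format."""
    return seconds * sample_rate * bytes_per_sample
```

Under these assumptions, a 10 second sample occupies 160,000 bytes as raw PCM. If the client extracted, say, 100 landmark/fingerprint pairs from it (an arbitrary figure chosen for the example), the packed summary would occupy 800 bytes, a reduction by a factor of 200. The actual ratio depends on the landmark density and encoding, neither of which is specified here.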

FIG 3 shows details of the identification reporting and transactional aspects 
of the present invention embodied in reporting and transaction block 115 (FIG 1)
and illustrates the highly integrated response mechanism to the user (i.e., a machine
or person) who requested the signal identification. As indicated in FIG 3, 
identification-reporting control block 310 of reporting and transaction block 115 
receives the identification information (i.e., song ID) from signal identification 
block 110 via line 113. Identification reporting control block 310 also receives
user data, illustratively user options and user information signals (collectively
indicated as lines 118 in FIG 3) from user identification block 120 via user database
312 and line 117 as indicated.

User database 312 is optionally utilized to store detailed user information 
and may facilitate the user billing function as described above. Functional block
314 is interposed between user database 312 and identification reporting control 
block 310, as shown in FIG. 3, to implement data updates to the user database (e.g., 
the user account) on line 316 with such detailed information as a function of the 
input signals 113, 117, and 118, to identification reporting control block 310 and
under the control of identification reporting control block 310 via control line 302.
Thus, user database block 312 operates in a read/write manner. It will be 
appreciated that user database block 312 may be particularly beneficial in some 
applications. Any time a user captures a signal for identification, the service can 
leverage existing, and capture new, data and statistical rules. First, the service can
log the signal identification for the user account, and every subsequent continued
interaction with the service. Second, the service can use existing data about the user 
account to enhance the experience. This will create a highly personalized 
experience, whether it be custom user account settings and/or preferences, a 
personalized user website, or targeted advertising. 

Referring again to FIG 3, identification reporting control block 310 is
operationally coupled via lines 304 and 306 to real-time reporting block 320 and
offline reporting block 330. The user interaction with the arrangement of the 
present invention may be in real-time or delayed. Real-time reporting provides an 
instant response about the identified signal to the user. This real-time response may
be in the form of data or voice. Voice annotation means that the user learns about
the identified signal by listening to a voice report. After receiving the voice report 
from voice annotation block 324, the user may be provided with additional options 
for further interaction, such as playback of the signal captured and purchase 
opportunities of the content identified (i.e., the user may place an order to purchase
the song or album that was just identified). This interaction is characterized by 
being voice prompted, in that interactive voice response user interface block 322, 
functionally coupled to identification reporting control block 310 through real-time 
reporting block 320, reads out the alternatives, asking the user to respond. The user
provides such responses to the voice prompts provided by the service through
keypad input or by voice (where in such instances, voice recognition methodologies
are employed to translate the user's voice responses into usable system inputs). 
Functional blocks 324, 326, and 328 in FIG. 3 illustrate several additional options, 
voice annotation, song excerpt playback, and purchase options, respectively, that
may be offered to the user in accordance with the invention. The purchase options,
as noted above, provide a solution to the problem of music fans who attempt to 
purchase a particular song recording after hearing it being played or performed (for 
example, from a radio broadcast), but cannot recall the name of the song or artist. In 
accordance with this feature of the invention, the user may immediately purchase 

20 the desired recording via real-time purchase options block 328 immediately after 
receiving the voice report from voice annotation block 324. 

Alternative options are also contemplated in accordance with the invention:
the user may hear a clip of the identified recorded song (as compared to the
playback of the captured signal described above); the user may hear more
information about the song, artist, or album; the user may hear clips from other
songs on the same album or from the same artist as the identified song; or, the
customer may choose to interact with customer service personnel. Song database
360 is operationally coupled to identification reporting control 310 via line 362 to
implement such contemplated options. Song database 360 may include songs as
well as related or other data.

Data responses are used to transmit information about the identified signal
back to the user through an interface such as a WAP browser interface on mobile
telephones, or other appropriate protocols over time. In FIG. 3, the WAP browser
interface block 340 is operationally coupled on line 338 to real-time reporting block
320. Thus, the user has the option to interact further with arrangement 100 (FIG. 1)
by using such an interface in accordance with the principles of the invention. This
particular interaction between user and service is characterized by being data
prompted and does not need to rely upon voice.

However, a combination of voice and data is also contemplated as falling
within the scope of the present invention as shown in FIG. 3, where such a
combination creates a beneficial and seamless experience for the user. The
technology platform in accordance with the invention advantageously integrates
both voice and data (respectively through IVR and web-based operations, for
example) so that the user's experience with a service utilizing the principles of the
invention is consistent and seamless across all interfaces (including those identified
in operational blocks 322, 340, and 350: interactive voice response interface, WAP
browser interface, and internet browser interface, respectively).

Delayed reporting takes place at a point in time when the user has
disconnected from the service, i.e., after the step of signal capture shown in signal
capture block 105 (FIG. 1). Delayed reporting (or "offline" reporting), whether
voice or data, is accomplished by sending information about the identified signal to
the user via an Internet browser, email message, SMS message or other
communication methodologies. This feature of the present invention is shown in
FIG. 3 with internet browser interface block 350, SMS text messaging block 360
and email information block 370 being operationally coupled to offline reporting
block 330 on lines 332, 334, and 336, respectively. A combination of the three
modes of offline reporting is possible and may be preferred in some applications of
the invention.

Delayed reporting further may include the option for further interaction with
a service which utilizes the principles of the present invention, such as playback,
purchase opportunities, and the like. The blocks operationally coupled and depicted
to the right of real-time reporting block 320 and offline reporting block 330 in FIG.
3 thus represent data and/or signal outputs from reporting and transaction block 115
to users or other constituencies. More particularly, with respect to real-time
reporting, interactive voice response user interface block 322 provides output data
from voice annotation block 324, song excerpt playback block 326, and real-time
purchase options block 328 back to the user (in this illustrative example of the
invention via the return channel of the duplex call on the user's mobile telephone)
as shown on line 372 of FIG. 3. Similarly, WAP browser interface block 340 and
online information browsing options block 342 provide interactive data output to
the user on line 374.

With respect to offline reporting, internet browser interface block 350 and
online purchase options block 352 provide output on line 376 while SMS text
messaging block 360 and email information block 370 provide output data, to the
mobile telephone user in this illustrative example of the invention, via lines 378 and
380 in FIG. 3. Output from reporting and transaction block 115 is also directed to
statistical data logger 130 (FIG. 1).

As with real-time purchase options block 328 in FIG. 3, WAP browser
interface 340 and internet browser interface 350 are operationally coupled to online
purchase options block 352 on lines 341 and 351, respectively. Online purchase
options block 352 may implement the same type of functionalities and alternative
options discussed when describing real-time purchase options block 328 above.
Similarly, online information browsing options block 342 is cross coupled to
receive input from internet browser interface block 350 and WAP browser interface
block 340 on lines 343 and 353, respectively.

A combination of real-time and offline reporting may be advantageously
employed in some applications of the invention. In such combinations, the user
distributes interactions with the service or particular aspects of interactions using
both the real-time and offline interfaces over a period of time or at the same time.
For example, a user may place an order for an identified song over the phone (the
real-time interface), and then provide payment and delivery information for that
order later through the service's website (the offline interface). Likewise, a user
may wish to interact with the IVR in real-time over the phone while simultaneously
interacting with the service to arrange for an SMS message about a particular song to
be sent to a friend's mobile phone. Of course, other beneficial combinations may be
readily arranged using the principles of the present invention.

Online information browsing options block 342 may be used in certain
applications of the invention to implement a number of desirable functionalities and
features. For example, a user using a WAP or Internet browser could access a
service provider's website which utilizes the features provided by online
information browsing options block 342 in FIG. 3 in accordance with the invention,
to recommend songs to friends, chat with other service users, play games (e.g., a
game where users try to identify obscure song tracks sampled by other users), and
pursue other activities that are facilitated by the internet's large reach. In addition,
information browsing options block 342 may be used to implement the delivery of
promotional materials (such as clips from albums) and special event tickets or
merchandise, or to manage archived data selected by that user such as sample and
"wish" lists. Information browsing options block 342 may also be used to implement
an interaction with the user to manage or search for other information.

The present invention also includes features implemented using
information-browsing options block 342 to allow users to set up alerts for news
releases and concert announcements. Users could also interact with a service
utilizing this feature of the invention to send information to friends (via SMS or
email, for example) on the types of music they like, or send an audio clip directly to
the friend's mobile telephone. A wide range of services and features, in fact, may
be readily implemented using the data response features of the present invention.

While the present invention contemplates that reporting, both real-time and
delayed, is primarily targeted to the user who requested identification, it may be
desirable in some applications of the invention to include delayed reporting to other
constituencies. This reporting occurs via a service utilizing the principles of the
present invention and not directly from the user. For example, a person who
identifies a song can request the service to send an SMS or email message about
the identification to other people as well.

Turning again to the present illustrative example of the invention where a
user is located in a high noise environment such as an automobile, interactive voice
response user interface block 322, in accordance with the invention, is utilized to
minimize human intervention. First time users may interact with a human operator,
in this illustrative example, using a mobile telephone, to register, for example, their
email address, user profile, and other details relating to unregistered users who may
wish to make purchases.

Communications may also be made with the user through email and website
interactions. After the initial registration, a telephone interface enables users to dial
a particular number to initiate a call to the service's IVR. The IVR identifies the
caller through the user's mobile telephone number in accordance with operations of
functional block 120 (FIG. 1). The user can then record a music sample, in this
example music being played on the user's car radio, and the sample is captured and
recognized (i.e., identified) as described above.

With the exception of the first-time customer registration, calls do not
require human intervention. The user may choose to receive a voice report relayed
back on the mobile telephone which provides the desired song identification from voice
annotation block 324 (FIG. 3), or optionally, the user may receive an email with the
name of the track, artist and album title, and links to the service provider's website
which facilitates the user accessing a range of services as described above.

In another illustrative example of the invention, a user is listening to radio or
television programming while working at a computer workstation. After an initial
registration process similar to that described above (which may be accomplished
over the telephone as with the previous example, or via online registration), the user
may record a music sample using the computer's microphone and then access the
service provider's facilities via website or dial up to allow the sample to be
uploaded as a sound file, captured, and identified in accordance with the present
invention. In this example, identification of the user may be effected through the
user's IP address. Other features and functions in this example are similar in scope
and operation to those described in the example above.

Although various embodiments are specifically illustrated and described
herein, it will be appreciated that modifications and variations of the invention are
covered by the above teachings and within the purview of the appended claims
without departing from the spirit and intended scope of the invention. For example,
while several of the embodiments depict the use of specific communication
techniques and protocols between various embodiments, any communication
technique will suffice to transfer information between the two devices. Moreover,
while some of the embodiments describe specific recording formats, any data and
information format for transferring data to the user may be employed by the
invention described herein. Furthermore, these examples should not be interpreted
to limit the modifications and variations of the invention covered by the claims but
are merely illustrative of possible variations.

APPENDIX 1
Abstract

We disclose a method and apparatus for recognizing sound, music, and
other similar signals. The disclosed invention is capable of recognizing an
exogenous sound signal that is a rendition of a known recording indexed in a
database. The exogenous sound signal may be subjected to distortion and
interference, including background noise, talking voices, compression artifacts,
band-limited filtering, transmission dropouts, time warping, and other linear and
nonlinear corruptions of the original signal. The algorithm is capable of identifying
the corresponding original recording from a large database of recordings in time
proportional to the logarithm of the number of entries in the database. Given
sufficient computational power the system can perform the identification in nearly
realtime, i.e. as the sound is being sampled, with a small lag.



Database construction

The sound database may consist of any collection of recordings, such as
speech, music, advertisements, or sonar signatures.

Indexing

In order to index the sound database, each recording in the library is
subjected to landmarking and fingerprinting analysis to generate an index set for
each item. Each recording in the database has a unique index, sound_ID.

Landmarking

Each sound recording is landmarked using methods to find distinctive and
reproducible locations within the sound recording. The ideal landmarking algorithm
will be able to mark the same points within a sound recording despite the presence
of noise and other linear and nonlinear distortion. The landmarking method is
conceptually independent of the fingerprinting process, but may be chosen to
optimize performance of the latter. Landmarking results in a list of timepoints
{landmark_k} within the sound recording at which fingerprints should be calculated.
A good landmarking scheme marks about 5-10 landmarks per second of sound
recording, of course depending on the amount of activity within the sound
recording.

Power Norms 

A simple landmarking technique is to calculate the instantaneous power at
every timepoint and to select local maxima. One way of doing this is to calculate
the envelope by rectifying and filtering the waveform directly. Another way is to
calculate the Hilbert transform (quadrature) of the signal and use the sum of the
magnitudes squared of the Hilbert transform and the original signal.
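
By way of a non-limiting illustration, the power-norm technique may be sketched in Python as follows. The function name `power_landmarks`, the `window` parameter, the use of squaring in place of rectification, and the moving-average filter are assumptions made only for this sketch and are not prescribed by the disclosure.

```python
def power_landmarks(samples, window=5):
    """Return sample indices where the smoothed instantaneous power peaks.

    Assumption for this sketch: the envelope is a moving average of the
    squared samples, which is only one possible smoothing filter.
    """
    power = [s * s for s in samples]
    half = window // 2
    envelope = []
    for i in range(len(power)):
        lo = max(0, i - half)
        hi = min(len(power), i + half + 1)
        envelope.append(sum(power[lo:hi]) / (hi - lo))
    # A landmark is a point strictly above its left neighbour and not
    # below its right neighbour (the first point of any plateau).
    return [
        i
        for i in range(1, len(envelope) - 1)
        if envelope[i - 1] < envelope[i] >= envelope[i + 1]
    ]


landmarks = power_landmarks([0, 1, 3, 1, 0, 0, 2, 5, 2, 0], window=3)
```

A longer `window` suppresses spurious maxima caused by noise at the cost of coarser time resolution, so in practice it would be tuned toward the landmark density discussed above.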

Spectral Lp Norms

The power norm method of landmarking is especially good for finding
transients in the sound signal. The power norm is actually a special case of the more
general Spectral Lp Norm, where p=2. The general Spectral Lp Norm is calculated
at each time along the sound signal by calculating the spectrum, for example via a
Hanning-windowed Fast Fourier Transform (FFT). The Lp norm for that time slice
is then calculated as the sum of the p-th power of the absolute values of the spectral
components, optionally followed by taking the p-th root. As before, the landmarks
are chosen as the local maxima of the resulting values over time.
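
By way of a non-limiting illustration, the Spectral Lp Norm calculation may be sketched in Python as follows. A direct discrete Fourier transform is substituted for the FFT so that the sketch needs only the standard library; the function names, the frame length, and the hop size are assumptions made only for this sketch.

```python
import cmath
import math


def hann(n):
    """Hanning window of length n (n must be at least 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectral_lp_norm(frame, p=2.0, take_root=False):
    """Sum of |X[k]|**p over the non-negative frequency bins of one frame."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n))]
    total = 0.0
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; an FFT would give the same values faster.
        coeff = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        total += abs(coeff) ** p
    return total ** (1.0 / p) if take_root else total


def lp_landmarks(samples, frame_len=16, hop=8, p=2.0):
    """Start indices of the frames whose Lp norm is a local maximum."""
    norms = [
        spectral_lp_norm(samples[s:s + frame_len], p)
        for s in range(0, len(samples) - frame_len + 1, hop)
    ]
    return [
        i * hop
        for i in range(1, len(norms) - 1)
        if norms[i - 1] < norms[i] >= norms[i + 1]
    ]
```

Taking the p-th root does not move the local maxima because the root is monotonic, which is why it can remain optional when the values are used only for landmark selection.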

Multislice landmarks

Multislice landmarks may be calculated by taking the sum of p-th powers
of absolute values of spectral components over multiple timeslices instead of a
single slice. Finding the local maxima of this extended sum allows optimization of
placement of the multislice fingerprints, described below.

Fingerprinting 

The algorithm computes a fingerprint at each landmark timepoint in the
recording. The fingerprint is generally a value or set of values that summarizes a set
of features in the recording near the timepoint. In our implementation the
fingerprint is a single numerical value that is a hashed function of multiple features.

The following are a few possible fingerprint categories.

Salient Spectral Fingerprints

In the neighborhood of each landmark timepoint a frequency analysis is
performed to extract the top several spectral peaks. A simple such fingerprint value
is just the single frequency value of the strongest spectral peak. The use of such a
simple peak resulted in surprisingly good recognition in the presence of noise, but
resulted in many false positive matches due to the non-uniqueness of such a simple
scheme. Using fingerprints consisting of the two or three strongest spectral peaks
resulted in fewer false positives, but in some cases created a susceptibility to noise:
if the second-strongest spectral peak was not strong enough to distinguish it from
its competitors in the presence of noise, the calculated fingerprint value would not
be sufficiently stable. Despite this, the performance of this case was also good.

Multislice Fingerprints 

In order to take advantage of the time-evolution of many sounds a set of
timeslices is determined by adding a set of offsets to a landmark timepoint. At each
resulting timeslice a Salient Spectral Fingerprint is calculated. The resulting set of
fingerprint information is then combined to form one multitone fingerprint. Each
such fingerprint is much more distinctive than the single-time salient spectral
fingerprint since it tracks temporal evolution, resulting in fewer false matches. Our
experiments indicate that using two or three timeslices along with the single
strongest spectral peak in each timeslice results in very good performance, even in
the presence of significant noise.
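
By way of a non-limiting illustration, a multislice fingerprint built from the single strongest spectral peak in each timeslice may be sketched in Python as follows. Simple bit-packing stands in for the hash function, a direct DFT without windowing stands in for the frequency analysis, and the names `strongest_bin` and `multislice_fingerprint`, the offsets, and the frame length are assumptions made only for this sketch.

```python
import cmath
import math


def strongest_bin(frame):
    """Index of the strongest DFT bin of one frame, ignoring the DC bin."""
    n = len(frame)
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        mag = abs(
            sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        )
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin


def multislice_fingerprint(samples, landmark, offsets=(0, 16), frame_len=16, bits=8):
    """Pack the strongest bin of each timeslice into a single integer."""
    value = 0
    for off in offsets:
        start = landmark + off
        peak = strongest_bin(samples[start:start + frame_len])
        # Shift earlier slices up and append this slice's peak index.
        value = (value << bits) | (peak & ((1 << bits) - 1))
    return value
```

Because the peak indices are packed in slice order, two sounds that contain the same tones in a different temporal order yield different fingerprint values.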

LPC Coefficients

In addition to finding the strongest spectral components, there are other
spectral features that can be extracted and used as fingerprints. LPC analysis
extracts the linearly predictable features of a signal, such as spectral peaks, as well
as spectral shape. LPC coefficients of waveform slices anchored at landmark
positions can be used as fingerprints by hashing the quantized LPC coefficients into
an index value. LPC is well-known in the art of digital signal processing.



Cepstral Coefficients 

Cepstral coefficients are useful as a measure of periodicity and may be used
to characterize signals that are harmonic, such as voices or many musical
instruments. A number of cepstral coefficients may be hashed together into an index
and used as a fingerprint. Cepstral analysis is well-known in the art of digital signal
processing.

Index Set

The resulting index set for a given sound recording is a list of pairs
(fingerprint, landmark) of analyzed values. Since the index set is composed simply
of pairs of values, it is possible to use multiple landmarking and fingerprinting
schemes simultaneously. For example, one landmarking/fingerprinting scheme may
be good at detecting unique tonal patterns, but poor at identifying percussion,
whereas a different algorithm may have the opposite attributes. Use of multiple
landmarking/fingerprinting strategies results in a more robust and richer range of
recognition performance. Different fingerprinting techniques may be used together
by reserving certain ranges of fingerprint values for certain kinds of fingerprints.
For example, in a 32-bit fingerprint value, the first 3 bits may be used to specify
which of 8 fingerprinting schemes the following 29 bits are encoding.
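
By way of a non-limiting illustration, the 3-bit scheme selector and 29-bit payload of the example above may be packed and unpacked in Python as follows. The function and constant names are assumptions made only for this sketch.

```python
SCHEME_BITS = 3
PAYLOAD_BITS = 29
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def tag_fingerprint(scheme, payload):
    """Place the scheme number in the top 3 bits of a 32-bit value."""
    if not 0 <= scheme < (1 << SCHEME_BITS):
        raise ValueError("scheme must be in the range 0..7")
    # The payload is truncated to 29 bits so it cannot overwrite the tag.
    return (scheme << PAYLOAD_BITS) | (payload & PAYLOAD_MASK)


def split_fingerprint(value):
    """Recover (scheme, payload) from a tagged 32-bit fingerprint."""
    return value >> PAYLOAD_BITS, value & PAYLOAD_MASK


tagged = tag_fingerprint(5, 517)
```

Because the scheme occupies the most significant bits, fingerprints of one scheme form a contiguous range of values, so a list sorted by fingerprint value keeps each scheme's entries together.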

Searchable Database

Once the index sets have been processed for each sound recording in the
database, a searchable database is constructed in such a way as to allow fast
(log-time) searching. This is accomplished by constructing a list of triplets (fingerprint,
landmark, sound_ID), obtained by appending the corresponding sound_ID to each
doublet from each index set. All such triplets for all sound recordings are collected
into a large index list. In order to optimize the search process, the list of triplets is
then sorted according to the fingerprint. Fast sorting algorithms are well-known in
the art and extensively discussed in D.E. Knuth, "The Art of Computer
Programming, Volume 3: Sorting and Searching," hereby incorporated by
reference. High-performance sorting algorithms can sort the list in N log(N) time,
where N is the number of entries in the list. Once this list is sorted it is further
processed by segmenting it such that each unique fingerprint in the list is collected
into a new master index list. Each entry in this master index list contains a
fingerprint value and a pointer to a list of (landmark, sound_ID) pairs. Rearranging
the index list in this way is optional, but saves memory since each fingerprint value
only appears once. It also speeds up the database search since the effective number
of entries in the list is greatly reduced to a list of unique values.
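
By way of a non-limiting illustration, the sorting and segmenting steps may be sketched in Python as follows. Representing the master index list as two parallel lists (`keys` and `postings`) and the name `build_master_index` are assumptions made only for this sketch.

```python
def build_master_index(index_sets):
    """Build a master index from {sound_id: [(fingerprint, landmark), ...]}.

    Returns (keys, postings): keys is the sorted list of unique fingerprint
    values, and postings[i] is the list of (landmark, sound_id) pairs that
    share keys[i].
    """
    # Append the sound_id to every doublet, then sort by fingerprint.
    triplets = sorted(
        (fingerprint, landmark, sound_id)
        for sound_id, pairs in index_sets.items()
        for fingerprint, landmark in pairs
    )
    keys, postings = [], []
    for fingerprint, landmark, sound_id in triplets:
        # Start a new segment whenever the fingerprint value changes.
        if not keys or keys[-1] != fingerprint:
            keys.append(fingerprint)
            postings.append([])
        postings[-1].append((landmark, sound_id))
    return keys, postings


keys, postings = build_master_index({"A": [(7, 10), (3, 20)], "B": [(7, 5)]})
```

Keeping `keys` as a sorted list of unique values is what permits a logarithmic-time lookup by binary search at recognition time.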

Alternatively, the master index list could also be constructed by inserting
each triplet into a B-tree with non-unique fingerprints hanging off a linked list.
Other possibilities exist for constructing the master index list. The master index list
is preferably held in system memory, such as DRAM, for fast access.

Recognition system

Once the master index list has been built it is possible to perform sound
recognition over the database.

Sound source

Exogenous sound is provided from any number of analog or digital sources,
such as a stereo system, television, Compact Disc player, radio broadcast,
telephone, mobile phone, internet stream, or computer file. The sounds may be
realtime or offline. They may be from any kind of environment, such as a disco,
pub, submarine, answering machine, sound file, stereo, radio broadcast, or tape
recorder. Noise may be present in the sound signal, for example in the form of
background noise, talking voices, etc.



Input to the recognition system

The sound stream is then captured into the recognition system either in
realtime or presented offline, as with a sound file. Realtime sounds may be sampled
digitally and sent to the system by a sampling device such as a microphone, or be
stored in a storage device such as an answering machine, computer file, tape
recorder, telephone, mobile phone, radio, etc. The sound signal may be subjected to
further degradation due to limitations of the channel or sound capture device.
Sounds may also be sent to the recognition system via an internet stream, FTP, or as
a file attachment to email.

Preprocessing

Once the sound signal has been converted into digital form it is processed
for recognition. As with the construction of the master index list, landmarks and
fingerprints are calculated. In fact, it is advisable to use the very same code that was
used for processing the sound recording library to do the landmarking and
fingerprinting of the exogenous sound input. The resulting index set for the exogenous
sound sample is also a list of pairs (fingerprint, landmark) of analyzed values.

Searching

Searching is carried out as follows: each fingerprint/landmark pair
(fingerprint_k, landmark_k) in the resulting input sound's index set is processed by
searching for fingerprint_k in the master index list. Fast searching algorithms on an
ordered list are well-known in the art and extensively discussed in Knuth, Volume 3
(ibid), incorporated by reference. If fingerprint_k is found then the corresponding list
of matching (landmark*_j, sound_ID_j) pairs having the same fingerprint is copied and
augmented with landmark_k to form a set of triplets of the form (landmark_k,
landmark*_j, sound_ID_j). This process is repeated for all k ranging over the input
sound's index set, with all the resulting triplets being collected into a large
candidate list.
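
By way of a non-limiting illustration, the search step may be sketched in Python using binary search over a sorted list of unique fingerprint values. The `keys`/`postings` layout of the master index list and the name `find_candidates` are assumptions made only for this sketch.

```python
from bisect import bisect_left


def find_candidates(keys, postings, sample_index_set):
    """Collect (landmark_k, landmark*_j, sound_ID_j) triplets.

    keys: sorted unique fingerprint values of the master index list.
    postings: postings[i] holds the (landmark*, sound_ID) pairs for keys[i].
    sample_index_set: (fingerprint_k, landmark_k) pairs of the input sound.
    """
    candidates = []
    for fingerprint, landmark in sample_index_set:
        pos = bisect_left(keys, fingerprint)  # logarithmic-time lookup
        if pos < len(keys) and keys[pos] == fingerprint:
            for library_landmark, sound_id in postings[pos]:
                candidates.append((landmark, library_landmark, sound_id))
    return candidates


candidates = find_candidates(
    [3, 7],
    [[(20, "A")], [(5, "B"), (10, "A")]],
    [(7, 0), (9, 4), (3, 10)],
)
```

Fingerprints of the input sound that do not occur in the library, such as the value 9 in the usage above, contribute nothing to the candidate list.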

After the candidate list is compiled it is further processed by segmenting
according to sound_ID. A convenient way of doing this is to sort the candidate list
according to sound_ID, or by insertion into a B-tree. The result of this is a list of
candidate sound_IDs, each of which has a scatter list of pairs of landmark
timepoints, (landmark_k, landmark*_j), with the sound_ID stripped off.

Scanning

The scatter list for each sound_ID is analyzed to determine whether it is a
likely match.

Thresholding

One way to eliminate a large number of candidates is to toss out those
having a small scatter list. Clearly, those having only 1 entry in their scatter lists
cannot be matched.
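
By way of a non-limiting illustration, segmenting the candidate list by sound_ID and discarding small scatter lists may be sketched in Python as follows. A dictionary is used here in place of sorting or a B-tree, and the name `scatter_lists` and the `min_points` parameter are assumptions made only for this sketch.

```python
from collections import defaultdict


def scatter_lists(candidates, min_points=2):
    """Group (landmark_k, landmark*_j, sound_ID) triplets by sound_ID.

    Returns {sound_ID: [(landmark_k, landmark*_j), ...]}, keeping only the
    sound_IDs whose scatter list has at least min_points entries.
    """
    grouped = defaultdict(list)
    for sample_landmark, library_landmark, sound_id in candidates:
        grouped[sound_id].append((sample_landmark, library_landmark))
    return {
        sound_id: points
        for sound_id, points in grouped.items()
        if len(points) >= min_points
    }


scatter = scatter_lists([(0, 5, "B"), (0, 10, "A"), (10, 20, "A")])
```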

Alignment

A key insight into the matching process is that the time evolution in
matching sounds must follow a linear correspondence, assuming that the timebases
on both sides are steady. This is almost always true unless the sound on one side has
been nonlinearly warped intentionally or subject to defective playback equipment
such as a tape deck with a warbling speed problem. Thus, the matching fingerprints
yielding correct landmark pairs (landmark_n, landmark*_n) in the scatter list of a given
sound_ID must have a linear correspondence of the form

$$\text{landmark}^{*}_{n} = m \cdot \text{landmark}_{n} + \text{offset}$$

where m is the slope, and should be near 1, landmark_n is the corresponding
timepoint within the exogenous sound signal, landmark*_n is the corresponding
timepoint within the library sound recording indexed by sound_ID, and offset is the
time offset into the library sound recording corresponding to the beginning of the
exogenous sound signal.

This relationship ties together the true landmark/fingerprint correspondences
between the exogenous sound signal and the correct library sound recording with
high probability, and excludes outlier landmark pairs. Thus, the problem of
determining whether there is a match is reduced to finding a diagonal line with
slope near 1 within the scatterplot of the points in the scatter list.

There are many ways of finding the diagonal line. A preferred method starts
by subtracting m*landmark_n from both sides of the above equation.

$$\text{landmark}^{*}_{n} - m \cdot \text{landmark}_{n} = \text{offset}$$

Assuming that m is approximately 1, we arrive at

$$\text{landmark}^{*}_{n} - \text{landmark}_{n} = \text{offset}$$

The diagonal-finding problem is then reduced to finding multiple landmark pairs
that cluster near the same offset value. This is accomplished easily by calculating a
histogram of the resulting offset values and searching for the offset bin with the
highest number of points. Since the offset must be positive if the exogenous sound
signal is fully contained within the correct library sound recording, landmark pairs
that result in a negative offset are excluded.

The winning offset bin of the histogram is noted for each qualifying
sound_ID, and the corresponding score is the number of points in the winning bin.
The sound recording in the candidate list with the highest score is chosen as the
winner. The winning sound_ID is provided to an output means to signal the success
of the identification.

To prevent false identification, a minimum threshold score may be used to
gate the success of the identification process. If no library sound recording meets
the minimum threshold then there is no identification.
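
By way of a non-limiting illustration, the offset histogram, the selection of the winner, and the minimum threshold score may be sketched in Python as follows. The slope m is taken to be 1 as in the derivation above; the name `best_match` and the `bin_width` and `min_score` parameters are assumptions made only for this sketch.

```python
from collections import Counter


def best_match(scatter, bin_width=1, min_score=2):
    """Score each sound_ID by its fullest offset bin and pick the winner.

    scatter: {sound_ID: [(landmark_n, landmark*_n), ...]}.
    Returns (sound_ID, offset, score), or None when no candidate reaches
    min_score.
    """
    best_id, best_offset, best_score = None, None, 0
    for sound_id, points in scatter.items():
        histogram = Counter()
        for sample_landmark, library_landmark in points:
            offset = library_landmark - sample_landmark  # assumes m = 1
            if offset >= 0:  # negative offsets are excluded
                histogram[offset // bin_width] += 1
        if not histogram:
            continue
        bin_index, score = histogram.most_common(1)[0]
        if score > best_score:
            best_id, best_offset, best_score = sound_id, bin_index * bin_width, score
    if best_score < min_score:
        return None
    return best_id, best_offset, best_score


result = best_match(
    {"A": [(0, 10), (10, 20), (20, 30), (5, 40)], "B": [(0, 3), (10, 50)]}
)
```

A `bin_width` larger than one timepoint tolerates small timing jitter between the two landmark lists at the cost of coarser offset resolution.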

Pipelined recognition 

In a realtime system the sound is provided to the recognition system
incrementally over time. In this case it is possible to process the data in chunks and
to update the index set incrementally. In each update period, the newly augmented
index set is used as above to retrieve candidate library sound recordings using the
searching and scanning steps above. The advantage of this approach is that if
sufficient data has been collected to identify the sound recording unambiguously
then the data acquisition may be terminated and the result may be announced.
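
By way of a non-limiting illustration, pipelined recognition may be sketched in Python as follows. The callbacks `analyze_chunk` (landmarking and fingerprinting of one chunk, with landmarks expressed on the timeline of the whole sample) and `identify` (the searching and scanning steps, returning a sound_ID and its score) are hypothetical placeholders, as is the `min_score` parameter.

```python
def recognize_incrementally(chunks, analyze_chunk, identify, min_score=5):
    """Feed chunks until one sound_ID is identified with enough confidence.

    analyze_chunk(chunk) -> list of (fingerprint, landmark) pairs.
    identify(index_set)  -> (sound_ID or None, score).
    Returns (sound_ID or None, number of chunks consumed).
    """
    index_set = []
    used = 0
    for chunk in chunks:
        used += 1
        index_set.extend(analyze_chunk(chunk))  # augment the index set
        sound_id, score = identify(index_set)
        if sound_id is not None and score >= min_score:
            return sound_id, used  # stop acquiring data early
    return None, used
```

Because `chunks` may be any iterable, including a generator fed by a live capture device, returning early is what terminates the data acquisition.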

Reporting the result 

Once the correct sound has been identified, the result is reported. Among the
result-reporting means, this may be done using a computer printout, email, SMS
text messaging to a mobile phone, computer-generated voice annotation over a
telephone, or posting of the result to an internet account which the user can access
later.

WHAT IS CLAIMED IS:

1. A method for identifying recorded sound signals comprising the steps of:
a) receiving a sound stream containing at least a portion of an unidentified
recorded sound;
b) processing the received sound stream to construct a set of values
corresponding to characteristics of the unidentified recorded sound;
c) searching a library having a plurality of sets of values corresponding to
characteristics of a plurality of identified recorded sounds; and
d) comparing the constructed set of values to one or more sets of values in the
library.

2. The method according to claim 1, further including a step of selecting an
identified recorded sound to the unidentified recorded sound based on a result of the
comparing step c).

3. The method according to claim 2, further including a step of returning
information relating to the selected identified recorded sound to the user.

4. The method according to claim 1, wherein the unidentified recorded sound is
provided to the user over a transmission medium.

5. The method according to claim 4, wherein the transmission medium includes
acoustic waves.

6. The method according to claim 4, wherein the transmission medium includes
radio waves.

7. The method according to claim 4, wherein the transmission medium includes
digital audio streams.

8. The method according to claim 4, wherein the transmission medium includes
PCM streams.

9. The method according to claim 4, wherein the transmission medium includes
internet streaming broadcasts.

10. The method according to claim 4, wherein the digital audio stream is
compressed.

11. The method according to claim 10, wherein the compressed digital audio
stream is a Dolby Digital or MP3 audio stream.

12. The method according to claim 3, wherein the information is returned over
a first delivery channel, the first delivery channel including a user interface selected
from the group consisting of real-time reporting interfaces and offline reporting
interfaces.

1 13. The method according to claim 12, wherein the real-time reporting interface 

2 is an interactive voice response interface. 

1 14. The method according to claim 12, wherein the reported information is 

2 provided to the user as a voice report. 

1 15. The method according to claim 12, wherein the real-time reporting interface 

2 is a WAP browser interface. 

1 16. The method according to claim 12, wherein the offline reporting interface is 

2 an internet browser interface. 

1 17. The method according to claim 1, wherein the sound stream is received over 

2 a second delivery channel selected from the group consisting of POTS lines, wireless 

3 cellular, wireless PCM, GSM, internet, radio, satellite, and a network. 

1 18. The method according to claim 17, wherein the network includes a local 

2 area network. 
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1 19. The method according to claim 17, wherein the network includes an 

2 Ethernet network. 

1 20. The method according to claim 17, wherein the network includes one or 

2 more private networks . 

1 21. The method according to claim 1 7, wherein the network includes a cable 

2 network. 

1 22. The method according to claim 17, wherein the network operates according 

2 to a short-range wireless protocol. 

1 23. The method according to claim 22, wherein the short-range wireless protocol 

2 includes the Bluetooth wireless standard. 

1 24. The method according to claim 1, wherein the sound stream is captured by a 

2 capture device operated by a user. 

1 25. The method according to claim 24, wherein the capture device includes a 

2 telephone. 

1 26. The method according to claim 24, wherein the capture device includes a 

2 mobile telephone. 

1 27. The method according to claim 24, wherein the capture device includes a 

2 tape recorder. 

1 28. The method according to claim 24, wherein the capture device includes a 

2 digital voice recorder. 

1 29. The method according to claim 24, wherein the capture device includes an 

2 answering machine. 

1 30. The method according to claim 24, wherein the capture device includes a 

2 radio receiver. 

1 31. The method according to claim 24, wherein the capture device includes a 

2 walkie-talkie. 

1 32. The method according to claim 24, wherein the capture device includes an 

2 internet streaming socket. 

1 33. The method according to claim 24, wherein the capture device includes a 

2 voice-over-IP transceiver. 
1 34. The method according to claim 2, wherein the result is obtained by 

2 determining a score associated with the constructed set of values and one or more sets 

3 of values in the library. 

1 35. A method for providing a transaction to a user exposed to a media stream, 

2 the method comprising the steps of: 

3 a) receiving a signal including a captured sample of media stream from the 

4 user; 

5 b) determining from the signal a characteristic of the captured sample; and 

6 c) triggering a predetermined transaction with the user in response to the 

7 determined characteristic. 

1 36. The method according to claim 35, wherein the predetermined transaction 

2 includes sales and purchase of merchandise. 

1 37. The method according to claim 35, wherein the predetermined transaction 

2 includes an offer for sale of merchandise. 

1 38. The method according to claim 37, wherein the offer for sale of 

2 merchandise includes an offer to sell recordings of music. 
1 39. The method according to claim 38, wherein the recording is related to a 

2 characteristic of the captured sample. 

1 40. The method according to claim 35, wherein the predetermined transaction 

2 includes furnishing and receiving information. 

1 41. The method according to claim 35, wherein the predetermined transaction 

2 includes delivery of advertising or promotional offers. 

1 42. The method according to claim 41, wherein the promotional offers include 

2 trial offers. 

1 43. The method according to claim 41, wherein the promotional offers include 

2 offers to sell merchandise or services at discounted prices. 

1 44. The method according to claim 35, wherein the predetermined transaction 

2 includes an exchange of information between a sales source and the user attendant to a 

3 sale of merchandise or services to the user. 

1 45. The method according to claim 37, wherein the offer is selected in response 

2 to a profile of the user. 

1 46. The method according to claim 37, wherein the offer is selected in response 

2 to a history of transactions completed with the user. 

1 47. A method for identifying music to a user comprising: 

2 a) receiving a signal including a captured sample of the music from the user; 

3 b) determining from the signal a characteristic of the captured sample; 

4 c) comparing the characteristic of the captured sample to a characteristic 

5 associated with an identity record contained in a database; and 

6 d) locating an identity record corresponding to the captured sample according 

7 to a result of the comparison. 
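
By way of illustration only, steps a) through d) of claim 47 might be sketched as follows. The characteristic used here (the set of rise/fall patterns in short windows), the Jaccard comparison, the 0.5 threshold, and the names IdentityRecord, determine_characteristic, compare, and locate_identity_record are all assumptions made for this sketch; none of them is taken from the specification, and a practical system would need a characteristic that tolerates noise and distortion. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityRecord:
    """Hypothetical identity record: metadata plus a stored characteristic."""
    title: str
    artist: str
    characteristic: frozenset


def determine_characteristic(samples, window=4):
    """Step b): toy characteristic, the set of rise/fall patterns in sliding windows."""
    patterns = set()
    for start in range(len(samples) - window):
        chunk = samples[start:start + window + 1]
        patterns.add(tuple(later > earlier
                           for earlier, later in zip(chunk, chunk[1:])))
    return frozenset(patterns)


def compare(a, b):
    """Step c): Jaccard similarity between two characteristics, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def locate_identity_record(samples, database, threshold=0.5):
    """Step d): return the best-scoring record, or None if no score reaches threshold."""
    characteristic = determine_characteristic(samples)
    best, best_score = None, 0.0
    for record in database:
        score = compare(characteristic, record.characteristic)
        if score > best_score:
            best, best_score = record, score
    return best if best_score >= threshold else None
```

In this sketch the database is simply a list of records built in advance by applying determine_characteristic to each full recording; a short excerpt of a recording is then passed to locate_identity_record. 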

1 48. The method according to claim 47, wherein the music is received by the 

2 user via a radio broadcast and the captured sample includes a sample of the radio 

3 broadcast. 

1 49. The method according to claim 48, further including returning the identity 

2 record to the user. 

1 50. The method according to claim 48, further including offering to sell to the 

2 user a recording including at least a song which corresponds to the located identity 

3 record. 
1 51. The method according to claim 48, further including offering to provide to 

2 the user information relating to the located identity record. 

1 52. The method according to claim 48, further including a step of playing a 

2 recording of a song corresponding to the located identity record to the user. 

1 53. The method according to claim 48, further including a step of offering to 

2 sell merchandise. 

1 54. The method according to claim 53, wherein the merchandise relates to the 

2 located identity record. 

1 55. The method according to claim 48, further including offering to sell live 

2 performance tickets. 

1 56. The method according to claim 55, wherein the live performance tickets 

2 relate to the located identity record. 

1 57. The method according to claim 48, further including offering to sell record 

2 albums to be released at a future time. 

1 58. The method according to claim 48, further including offering to provide 

2 information pertaining to a location of retail music establishments. 
1 59. The method according to claim 58, wherein the information further includes 

2 information pertaining to a location of retail music establishments that are in close 

3 proximity to the user. 

1 60. The method according to claim 48, further including downloading media to 

2 a user device. 

1 61. The method according to claim 60, wherein the downloaded media includes 

2 a pre-recorded song corresponding to the located identity record. 

1 62. The method according to claim 60, wherein the user device is selected from 

2 the group consisting of PCs, PDAs, internet access devices, wireless internet devices, 

3 mobile telephones, wireless information devices, and pagers. 

1 63. The method according to claim 48, further including receiving commands 

2 from the user in response to the returned identity record. 

1 64. The method according to claim 63, further including performing an 

2 additional predetermined step in response to the command. 

1 65. The method according to claim 64, wherein the predetermined step includes 

2 delivering a message to a third party. 
1 66. The method according to claim 65, wherein the message includes a 

2 recommendation of music corresponding to the located identity record. 

1 67. The method according to claim 64, wherein the predetermined step includes 

2 a collection of data indicative of music popularity. 

1 68. The method according to claim 67, wherein the collected data includes data 

2 received from the user. 

1 69. The method according to claim 64, wherein the predetermined step includes 

2 playing additional songs not associated with the located identity record to the user. 

1 70. The method according to claim 64, wherein the predetermined step includes 

2 locating one or more music performance artists matching a predetermined criterion. 

1 71. The method according to claim 70, wherein the criterion includes similarity 

2 of the one or more music performance artists to an artist associated with the located 

3 identity record. 

1 72. The method according to claim 64, wherein the predetermined step includes 

2 providing a critical review of a music performance artist associated with the located 

3 identity record. 
1 73. The method according to claim 64, wherein the predetermined step includes 

2 providing critical reviews of a record album containing a song corresponding to the 

3 located identity record. 

1 74. The method according to claim 64, wherein the predetermined step includes 

2 providing information pertaining to popularity of a song or music performance artist 

3 associated with the located identity record. 

1 75. The method according to claim 64, wherein the predetermined step includes 

2 delivering information to the user. 

1 76. The method according to claim 75, wherein the information pertains to the 

2 located identity record. 

1 77. The method according to claim 75, wherein the information is delivered in 

2 an SMS format. 

1 78. The method according to claim 75, wherein the information pertains to new 

2 album releases. 

1 79. The method according to claim 76, wherein the information pertains to 

2 scheduling of concerts. 
1 80. The method according to claim 79, wherein the concert is related to the 

2 located identity record. 

1 81. The method according to claim 48, further including storing the captured 

2 sample. 

1 82. The method according to claim 64, wherein the predetermined step includes 

2 delivering an excerpt of a recording of a song corresponding to the located identity 

3 record. 

1 83. The method according to claim 72, wherein the excerpt is delivered to the 

2 user. 

1 84. The method according to claim 72, wherein the excerpt is delivered to a 

2 third party. 

1 85. A method for identifying music to a user exposed to a broadcast that 

2 includes unidentified music, the method comprising: 

3 a) receiving a signal including a captured sample of the broadcast from the 

4 user; 

5 b) determining from the signal a characteristic of the captured sample; 
6 c) comparing the characteristic of the captured sample to a characteristic 

7 associated with an identity record contained in a database; 

8 d) attempting to locate an identity record corresponding to the captured sample 

9 according to a result of the comparison; and 

10 e) storing the captured sample if the location attempt is unsuccessful. 
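
As a hedged illustration of step e) of claim 85, the following sketch retains a captured sample whenever the lookup fails. The exact-match dictionary lookup, the rise/fall characteristic, and the names characteristic_of and identify_or_store are assumptions introduced for this example only; the claim does not prescribe how the comparison or the storage is implemented. 

```python
def characteristic_of(samples):
    """Toy characteristic (an assumption): the rise/fall contour of the sample."""
    return tuple(later > earlier for earlier, later in zip(samples, samples[1:]))


def identify_or_store(samples, database, unidentified):
    """Steps b) to e): look up the sample; keep a copy of it if no record is located.

    database     -- dict mapping a characteristic to an identity record
    unidentified -- list that accumulates samples whose lookup failed
    """
    record = database.get(characteristic_of(samples))
    if record is None:
        # Step e): the location attempt was unsuccessful, so store the sample.
        unidentified.append(list(samples))
    return record
```

The stored samples could later be delivered elsewhere or re-examined once the database has grown, as the dependent claims contemplate. 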

1 86. The method according to claim 85, further including delivering the captured 

2 sample to remote locations. 

1 87. The method according to claim 85, wherein the delivered captured samples 

2 are used in games or contests involving attempts to identify the unidentified music. 

1 88. A method for identifying music to a user exposed to a broadcast, which 

2 includes unidentified music, the method comprising the steps of: 

3 a) receiving a signal including a captured sample of the broadcast from the 

4 user; 

5 b) determining from the signal a characteristic of the captured sample; 

6 c) comparing the characteristic of the captured sample to a characteristic 

7 associated with an identity record contained in a database; 

8 d) attempting to locate an identity record corresponding to the captured sample 

9 according to a result of the comparison; and 

10 e) providing an interactive interface for the user to store and manipulate data 

11 associated with a successfully located identity record. 

1 89. The method according to claim 88, wherein the interface is selected from 

2 the group consisting of real-time interfaces, offline interfaces, and combinations 

3 thereof. 

1 90. The method according to claim 89, wherein the offline interface is selected 

2 from the group consisting of internet browsers, email, SMS messaging, and 

3 combinations thereof. 

1 91. The method according to claim 88, wherein the interface is arranged to 

2 allow the user to store, retrieve and forward the data. 

1 92. The method according to claim 88, wherein the interface is arranged to 

2 allow the user to communicate with third parties. 

1 93. The method according to claim 88, wherein the interface is arranged to 

2 allow the user to participate in games or contests. 

1 94. The method according to claim 93, wherein the games or contests include 

2 identifying unidentified songs. 

1 95. The method according to claim 88, wherein the interface is arranged to 

2 allow the user to forward data to a website. 
1 96. The method according to claim 95, wherein the website is configured to 

2 provide personalized radio station services to the user. 

1 97. An apparatus for identifying music to a user exposed to a broadcast that 

2 includes music unidentified to the user comprising: 

3 a) a receiver arranged to receive a signal including a captured sample of the 

4 broadcast from the user; 

5 b) a signal analyzer for determining from the signal a characteristic of the 

6 captured sample; 

7 c) a database containing a library of identity records; and 

8 d) a comparator that compares the determined characteristic to characteristics 

9 associated with identity records contained in the database for locating an 

10 identity record that matches the captured sample. 

1 98. The apparatus according to claim 97, further including a transmitter for 

2 transmitting information related to the located identity record to the user. 

1 99. The apparatus according to claim 97, further including an interactive voice 

2 response unit. 

1 100. The apparatus according to claim 99, wherein the interactive voice 

2 response unit is operated from a script which includes an offer to sell to the user a 

3 music recording including at least a song which corresponds to the located identity 

4 record. 

1 101. The apparatus according to claim 99, wherein the interactive voice 

2 response unit is operated from a script which includes an offer to provide to the user 

3 information relating to the located identity record. 

1 102. The apparatus according to claim 97, further including a music player for 

2 playing a recording of a song corresponding to the located identity record to the user. 

1 103. The apparatus according to claim 97, further including a music downloader 

2 for downloading a recording of a song corresponding to the located identity record to 

3 the user. 

1 104. The apparatus according to claim 103, wherein the downloader downloads 

2 the recording to a user device. 



1 105. The apparatus according to claim 104, wherein the user device is selected 

2 from the group consisting of PCs, PDAs, internet access devices, wireless internet 

3 devices, mobile telephones, wireless information devices, and pagers. 

1 106. A method for identifying recorded sound signals comprising: 

2 a) receiving a signal including a captured sample of an audio signal; 

3 b) deriving from the signal a characteristic of the captured sample; 

4 c) comparing the characteristic of the captured sample to a characteristic 

5 associated with a stored identity record of identified sound signals contained 

6 in a database; and 

7 d) determining a probability of a match between the audio signal and a stored 

8 identity record using the results of the comparing step c). 
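
One way step d) of claim 106 could be realized, offered only as a sketch, is a Bayesian update over the number of features that agreed in the comparing step. The prior, the two per-feature agreement rates, and the function name match_probability are illustrative assumptions, not values or terms from the specification. 

```python
def match_probability(matched, total, prior=0.01,
                      p_hit_if_same=0.8, p_hit_if_different=0.1):
    """Posterior probability that the sample and the stored record are the same recording.

    matched -- number of compared features that agreed
    total   -- number of features compared
    All probabilities passed as defaults are assumed, illustrative values.
    """
    if not 0 <= matched <= total:
        raise ValueError("matched must lie between 0 and total")
    missed = total - matched
    # Likelihood of the observed agreement pattern under each hypothesis.
    like_same = p_hit_if_same ** matched * (1 - p_hit_if_same) ** missed
    like_diff = p_hit_if_different ** matched * (1 - p_hit_if_different) ** missed
    numerator = prior * like_same
    return numerator / (numerator + (1 - prior) * like_diff)
```

With no features compared the function returns the prior unchanged; as more features agree, the probability rises toward one, and it falls toward zero as more features disagree. 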

1 107. A method for detecting use of copyrighted audio media, the method 

2 comprising: 

3 a) receiving a signal including a captured sample of an audio signal; 

4 b) deriving from the signal a characteristic of the captured sample; 

5 c) comparing the characteristic of the captured sample to a characteristic 

6 associated with a stored identity record of copyrighted audio media 

7 contained in a database; and 

8 d) determining a probability of a match between the audio signal and a stored 

9 identity record using the results of the comparing step c). 